<h1 align="center"> GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching</h1>



# Introduction
<figure>
<img src="figs/framework.png">
<figcaption align = "center"><b>Figure 1: The overall architecture of GoMatching.</b></figcaption>
</figure>

1. We identify a main bottleneck of the state-of-the-art video text spotter: its limited recognition capability. To address this issue, we efficiently turn an off-the-shelf query-based image text spotter (ITS) into a video specialist and present a simple baseline termed GoMatching.
2. We introduce a rescoring mechanism and a long-short term matching module to adapt the image text spotter to video datasets and to enhance the tracker's capabilities.
3. We establish the ArTVideo test set to address the absence of curved text in current video text spotting datasets and to evaluate video text spotters on videos with arbitrary-shaped text. ArTVideo contains 20 video clips, with approximately 30% curved text.
4. GoMatching requires only 3 hours of training on one Nvidia RTX 3090 GPU for ICDAR15-video. On the video text spotting task, GoMatching achieves 70.52 MOTA on ICDAR15-video, setting a new record on the leaderboard. We show that it is feasible to freeze the off-the-shelf ITS and focus on tracking, which saves training budget while reaching SOTA performance.
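
To make the second contribution more concrete, here is a minimal sketch of the idea of rescoring plus two-stage (short-term, then long-term) association. It is not the code in this repository: the function names, the use of cosine similarity with greedy assignment, the `threshold` value, and the "take the larger score" rescoring rule are all assumptions chosen for brevity, standing in for the learned modules.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rescore(spotter_score, rescoring_score):
    # Assumption: keep the more confident of the frozen spotter's score
    # and a score recalibrated on video data.
    return max(spotter_score, rescoring_score)


def greedy_match(queries, tracks, threshold):
    """Greedily pair detections with tracks by descending similarity.

    queries: {detection_id: embedding}, tracks: {track_id: embedding}.
    Returns {detection_id: track_id} for pairs at or above `threshold`.
    """
    pairs = sorted(
        ((cosine(q, t), d, k) for d, q in queries.items() for k, t in tracks.items()),
        reverse=True,
    )
    assigned, used = {}, set()
    for sim, d, k in pairs:
        if sim < threshold:
            break  # remaining pairs are even less similar
        if d in assigned or k in used:
            continue
        assigned[d] = k
        used.add(k)
    return assigned


def associate(detections, prev_tracks, history_tracks, threshold=0.5):
    """Short-term matching first, then long-term matching for leftovers."""
    # Stage 1: match against tracks seen in the previous frame.
    short = greedy_match(detections, prev_tracks, threshold)
    taken = set(short.values())
    # Stage 2: match still-unassigned detections against older tracks.
    leftover = {d: e for d, e in detections.items() if d not in short}
    free_history = {k: e for k, e in history_tracks.items() if k not in taken}
    long_term = greedy_match(leftover, free_history, threshold)
    return {**short, **long_term}
```

In this toy setup, a text instance that disappears for a few frames (for example, due to occlusion) fails the short-term stage but can still recover its track identity in the long-term stage.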




# Usage
### Dataset

Videos in ICDAR15-video and DSText should be extracted into frames. The prepared data should be organized as follows:

```
|- ./datasets
   |--- ICDAR15
   |      |--- frame
   |      |      |--- Video_10_1_1
   |      |      |      |--- 1.jpg
   |      |      |      └--- ...
   |      |      └--- ...
   |      |--- frame_test
   |      |      |--- Video_11_4_1
   |      |      |      |--- 1.jpg
   |      |      |      └--- ...
   |      |      └--- ...
   |      |--- vts_train.json
   |      └--- vts_test_wo_anno.json
   |
   |--- DSText
   |      |--- frame
   |      |      |--- Activity
   |      |      |      |--- Video_163_6_3
   |      |      |      |      |--- 1.jpg
   |      |      |      |      └--- ...
   |      |      |      └--- ...
   |      |      └--- ...
   |      |--- frame_test
   |      |      |--- Activity
   |      |      |      |--- Video_162_6_2
   |      |      |      |      |--- 1.jpg
   |      |      |      |      └--- ...
   |      |      |      └--- ...
   |      |      └--- ...
   |      |--- vts_train.json
   |      └--- vts_test_wo_anno.json
   |
   |--- BOVText
   |      |--- frame
   |      |      |--- Cls1_Livestreaming
   |      |      |      |--- Cls1_Livestreaming_video1
   |      |      |      |      |--- 1.jpg
   |      |      |      |      └--- ...
   |      |      |      └--- ...
   |      |      └--- ...
   |      |--- frame_test
   |      |      |--- Cls1_Livestreaming
   |      |      |      |--- Cls1_Livestreaming_video5
   |      |      |      |      |--- 1.jpg
   |      |      |      |      └--- ...
   |      |      |      └--- ...
   |      |      └--- ...
   |      |--- Test
   |      |      |--- Cls1_Livestreaming
   |      |      |      |--- Cls1_Livestreaming_video5.json
   |      |      |      └--- ...
   |      |      └--- ...
   |      └--- vts_train.json
   |
   └--- ArTVideo
          |--- frame
          |      |--- video_1
          |      |      |--- 1.jpg
          |      |      └--- ...
          |      └--- ...
          |--- json
          |      |--- video_1.json
          |      └--- ...
          └--- video
                 |--- video_1.mp4
                 └--- ...
```
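
The frame extraction step is not scripted above, so here is one possible way to do it, assuming `ffmpeg` is installed and on your `PATH`. The helper names are hypothetical and not part of this repository. Starting the numbering at `1.jpg` is inferred from the directory tree; the JPEG quality setting is an arbitrary choice.

```python
import subprocess
from pathlib import Path


def ffmpeg_frame_command(video_path, frames_root):
    """Build the ffmpeg command that dumps one video into <frames_root>/<video name>/.

    Frames are written as 1.jpg, 2.jpg, ... to mirror the layout above.
    """
    video = Path(video_path)
    out_dir = Path(frames_root) / video.stem
    cmd = [
        "ffmpeg",
        "-i", str(video),
        "-start_number", "1",   # first frame is saved as 1.jpg
        "-q:v", "2",            # high JPEG quality (assumption, not a repo requirement)
        str(out_dir / "%d.jpg"),
    ]
    return out_dir, cmd


def extract_frames(video_path, frames_root):
    """Create the output folder and run ffmpeg; raises if ffmpeg fails."""
    out_dir, cmd = ffmpeg_frame_command(video_path, frames_root)
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(cmd, check=True)
    return out_dir
```

For example, `extract_frames("videos/Video_10_1_1.mp4", "datasets/ICDAR15/frame")` would write into `datasets/ICDAR15/frame/Video_10_1_1/`. It is worth checking that the resulting frame indices line up with the frame IDs used in the annotation files before training.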

### Installation

Python 3.8 + PyTorch 1.9.0 + CUDA 11.1 + Detectron2 v0.6

```shell
conda create -n gomatching python=3.8 -y
conda activate gomatching
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.9/index.html
cd third_party
python setup.py build develop
```

### Train
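
Before launching a training run, it can help to confirm that the dataset folders match the directory tree in the Dataset section. The following is a small sketch of such a check. The `EXPECTED` table and the `missing_entries` helper are hypothetical and only cover the top-level entries listed in that tree.

```python
from pathlib import Path

# Top-level entries per dataset, copied from the directory tree in the Dataset section.
EXPECTED = {
    "ICDAR15": ["frame", "frame_test", "vts_train.json", "vts_test_wo_anno.json"],
    "DSText": ["frame", "frame_test", "vts_train.json", "vts_test_wo_anno.json"],
    "BOVText": ["frame", "frame_test", "Test", "vts_train.json"],
    "ArTVideo": ["frame", "json", "video"],
}


def missing_entries(datasets_root, dataset):
    """Return the expected files/folders that are absent for one dataset."""
    root = Path(datasets_root) / dataset
    return [name for name in EXPECTED[dataset] if not (root / name).exists()]
```

Calling `missing_entries("./datasets", "ICDAR15")` returns an empty list when everything is in place, and otherwise the names that still need to be prepared.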

**ICDAR15**

```shell
python train_net.py --num-gpus 1 --config-file configs/GoMatching_ICDAR15.yaml
```

**DSText**

```shell
python train_net.py --num-gpus 1 --config-file configs/GoMatching_DSText.yaml
```

**BOVText**

```shell
python train_net.py --num-gpus 1 --config-file configs/GoMatching_BOVText.yaml
```

### Evaluation

**ICDAR15**

```shell
python eval.py --config-file configs/GoMatching_ICDAR15.yaml --input ./datasets/ICDAR15/frame_test/ --output output/icdar15 --opts MODEL.WEIGHTS trained_models/ICDAR15/xxx.pth

cd output/icdar15/preds
zip -r ../preds.zip ./*
```

Then you can submit the `zip` file to the [official website](https://rrc.cvc.uab.es/?ch=3&com=evaluation&task=4) for evaluation.
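
For readers unfamiliar with the MOTA score quoted in the Introduction, the standard CLEAR MOT definition is one minus the ratio of tracking errors (misses, false positives, identity switches) to the number of ground-truth objects, summed over all frames. The sketch below only illustrates that formula. It does not reproduce the official evaluation server, whose matching rules are not covered here, and the counts in the usage note are made up.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple Object Tracking Accuracy from error counts summed over all frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt
```

With hypothetical counts of 10 misses, 5 false positives, and 5 identity switches over 100 ground-truth instances, `mota(10, 5, 5, 100)` evaluates to about 0.8, which a leaderboard would typically display as 80%.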

**DSText**

```shell
python eval.py --config-file configs/GoMatching_DSText.yaml --input ./datasets/DSText/frame_test/ --output output/dstext --opts MODEL.WEIGHTS trained_models/DSText/xxx.pth

cd output/dstext/preds
zip -r ../preds.zip ./*
```

Then you can submit the `zip` file to the [official website](https://rrc.cvc.uab.es/?ch=22&com=evaluation&task=2) for evaluation.

**BOVText**

```shell
python eval.py --config-file configs/GoMatching_BOVText.yaml --input ./datasets/BOVText/frame_test/ --output output/bovtext --opts MODEL.WEIGHTS trained_models/BOVText/xxx.pth

### evaluation
# 1. eval tracking
python tools/Evaluation_Protocol_BOV_Text/Task1_VideoTextTracking/evaluation.py --groundtruths ./datasets/BOVText/Test/test_annotation --tests output/bovtext/jsons/

# 2. eval spotting
python tools/Evaluation_Protocol_BOV_Text/Task2_VideoTextSpottinging/evaluation.py --groundtruths ./datasets/BOVText/Test/test_annotation --tests output/bovtext/jsons/

```

**ArTVideo**

```shell
python eval.py --config-file configs/GoMatching_Eval_ArTVideo.yaml --input ./datasets/ArTVideo/frame/ --output output/artvideo --opts MODEL.WEIGHTS trained_models/ICDAR15/xxx.pth

### evaluation
# 1. eval tracking on straight and curved text
python tools/Evaluation_Protocol_ArtVideo/eval_trk.py --groundtruths ./datasets/ArTVideo/json/ --tests output/artvideo/jsons/

# 2. eval tracking on curved text only
python tools/Evaluation_Protocol_ArtVideo/eval_trk.py --groundtruths ./datasets/ArTVideo/json/ --tests output/artvideo/jsons/ --curve

# 3. eval spotting on straight and curved text
python tools/Evaluation_Protocol_ArtVideo/eval_e2e.py --groundtruths ./datasets/ArTVideo/json/ --tests output/artvideo/jsons/

# 4. eval spotting on curved text only
python tools/Evaluation_Protocol_ArtVideo/eval_e2e.py --groundtruths ./datasets/ArTVideo/json/ --tests output/artvideo/jsons/ --curve
```

**Note:** To visualize the results, add the `--show` argument as follows:

```shell
python eval.py --config-file configs/GoMatching_ICDAR15.yaml --input ./datasets/ICDAR15/frame_test/ --output output/icdar15 --show --opts MODEL.WEIGHTS trained_models/ICDAR15/xxx.pth
```

